34 research outputs found

    Competitive function approximation for reinforcement learning

    The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming, where the value function is estimated using its own current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that drown out the occasional updates of infrequently sampled regions. We propose a competitive approach to function approximation in which many different local approximators are available at a given input, and the one expected to give the best approximation is selected by means of a relevance function. The local nature of the approximators allows fast adaptation to non-stationary changes and mitigates the biased-sampling problem. The coexistence of multiple approximators, updated and tried in parallel, permits obtaining a good estimation much faster than would be possible with a single approximator. Experiments on different benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches.
    Preprint
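
    To make the selection mechanism concrete, here is a minimal Python sketch of the competitive scheme under our own assumptions: the Gaussian relevance, the constant local model, and all names are illustrative stand-ins, not the paper's definitions.

    import numpy as np

    class LocalApproximator:
        def __init__(self, center, width, lr=0.1):
            self.center = np.asarray(center, dtype=float)
            self.width = width    # size of the region this model covers
            self.lr = lr          # local learning rate
            self.value = 0.0      # simplest local model: a constant estimate

        def relevance(self, x):
            # Hypothetical relevance: proximity of x to this model's region.
            d2 = np.sum((np.asarray(x, dtype=float) - self.center) ** 2)
            return np.exp(-d2 / (2.0 * self.width ** 2))

        def update(self, target):
            # Purely local update: adapts quickly when the target drifts.
            self.value += self.lr * (target - self.value)

    def predict(approximators, x):
        # Competition: answer with the approximator most relevant at x.
        return max(approximators, key=lambda a: a.relevance(x)).value

    def learn(approximators, x, target):
        # All sufficiently relevant approximators are updated in parallel,
        # so several candidate estimates mature at once.
        for a in approximators:
            if a.relevance(x) > 1e-2:
                a.update(target)

    Because each update touches only the models near x, frequent visits to one region cannot overwrite estimates held elsewhere, which is the intuition behind the mitigation of biased sampling.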

    Stochastic approximations of average values using proportions of samples

    IRI Technical Report. In this work we explain how the stochastic approximation of the average of a random variable is carried out when the observations used in the updates consist of proportions of samples rather than complete samples.
    Preprint
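
    As a worked illustration of the setting (our reading of the abstract, not necessarily the report's exact rule): the standard stochastic approximation of an average corrects the estimate with a complete sample,

    \[ \hat{\mu}_{t+1} = \hat{\mu}_t + \alpha_t \left( x_t - \hat{\mu}_t \right), \]

    and one plausible way to use only a proportion \( p_t \in (0, 1] \) of a sample is to scale the correction accordingly,

    \[ \hat{\mu}_{t+1} = \hat{\mu}_t + \alpha_t \, p_t \left( x_t - \hat{\mu}_t \right), \]

    so that partial observations contribute proportionally smaller steps. The technical report should be consulted for the actual update.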

    A competitive strategy for function approximation in Q-learning

    In this work we propose an approach for generalization in continuous-domain Reinforcement Learning that, instead of using a single function approximator, tries many different function approximators in parallel, each one defined in a different region of the domain. Associated with each approximator is a relevance function that locally quantifies the quality of its approximation, so that, at each input point, the approximator with the highest relevance can be selected. The relevance function is defined using parametric estimations of the variance of the q-values and of the density of samples in the input space, which quantify the accuracy of, and the confidence in, the approximation, respectively. These parametric estimations are obtained from a probability density distribution represented as a Gaussian Mixture Model embedded in the input-output space of each approximator. In our experiments, the proposed approach required fewer experiences for learning and produced more stable convergence profiles than a single function approximator.
    Peer Reviewed. Preprint
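
    A hedged sketch of how such a relevance could be computed from a fitted Gaussian Mixture Model; the function names and the final combination rule are our illustrative assumptions, the paper defines its own.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fit_joint_gmm(X, q, n_components=3, seed=0):
        # Embed the samples in the joint input-output space of one approximator.
        data = np.column_stack([X, q])
        return GaussianMixture(n_components=n_components, random_state=seed).fit(data)

    def relevance(gmm, x, dim_x):
        x = np.asarray(x, dtype=float)
        density = 0.0
        weighted_var = 0.0
        for k in range(gmm.n_components):
            mu_x = gmm.means_[k, :dim_x]
            S = gmm.covariances_[k]
            Sxx, Sxq, Sqq = S[:dim_x, :dim_x], S[:dim_x, dim_x:], S[dim_x:, dim_x:]
            inv_Sxx = np.linalg.inv(Sxx)
            # Confidence term: marginal input density under component k.
            diff = x - mu_x
            norm = np.sqrt((2 * np.pi) ** dim_x * np.linalg.det(Sxx))
            dens_k = gmm.weights_[k] * np.exp(-0.5 * diff @ inv_Sxx @ diff) / norm
            # Accuracy term: conditional variance of q given x under component k.
            var_k = (Sqq - Sxq.T @ inv_Sxx @ Sxq).item()
            density += dens_k
            weighted_var += dens_k * var_k
        if density <= 0.0:
            return 0.0
        cond_var = weighted_var / density
        # One plausible combination: dense data and low q-variance => high relevance.
        return density / (cond_var + 1e-6)

    Here the marginal input density plays the role of confidence and the conditional variance of q given x the role of (inverse) accuracy; combining them as a ratio is just one plausible choice.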

    Efficient interactive decision-making framework for robotic applications

    This manuscript version is made available under the CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0/
    The inclusion of robots, such as service robots, in our society is imminent. Robots are now capable of reliably manipulating objects in our daily lives, but only when combined with artificial intelligence (AI) techniques for planning and decision-making, which allow a machine to determine how a task can be completed successfully. To perform decision making, AI planning methods use a set of planning operators to encode the state changes in the environment produced by a robotic action. Given a specific goal, the planner then searches for the best sequence of planning operators, i.e., the best plan that leads through the state space to satisfy the goal. In principle, planning operators can be hand-coded, but this is impractical for applications that involve many possible state transitions. An alternative is to learn them automatically from experience, which is most efficient when there is a human teacher. In this study, we propose a simple and efficient decision-making framework for this purpose. The robot executes its plan in a step-wise manner, and any planning impasse produced by missing operators is resolved online by asking a human teacher for the next action to execute. Based on the observed state transitions, this approach rapidly generates the missing operators by evaluating the relevance of several cause-effect alternatives in parallel using a probability estimate, which compensates for the high uncertainty inherent in learning from a small number of samples. We evaluated the validity of our approach in simulated and real environments, where it was benchmarked against previous methods. Humans learn in the same incremental manner, so we consider that our approach may be a better alternative to existing learning paradigms, which require offline learning, a significant amount of previous knowledge, or a large number of samples.
    Peer Reviewed. Postprint (author's final draft)
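
    The control flow of such a framework can be sketched in a few lines of Python; the planner, world, teacher, and goal objects are assumed stubs and the method names are illustrative, but the loop (plan step-wise, ask the teacher at an impasse, learn the missing operator from the observed transition) follows the description above.

    def run_task(planner, world, teacher, goal):
        # Step-wise execution: plan one action at a time from the current state.
        state = world.observe()
        while not goal.satisfied(state):
            action = planner.next_action(state, goal)
            if action is None:
                # Planning impasse: no known operator applies, so the
                # framework asks the human teacher for the next action.
                action = teacher.suggest_action(state, goal)
            next_state = world.execute(action)
            if planner.lacks_operator(state, action, next_state):
                # Generate the missing operator online from the observed
                # transition, weighing several cause-effect alternatives.
                planner.learn_operator(state, action, next_state)
            state = next_state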

    A general strategy for interactive decision-making in robotic platforms

    This work presents an integrated strategy for planning and learning suitable for executing tasks with robotic platforms without any previous task specification. The approach rapidly learns planning operators from few action experiences using a competitive strategy in which many alternative cause-effect explanations are evaluated in parallel, and the most successful ones are used to generate the operators. The system operates without task interruption by integrating into the planning-learning loop a human teacher who supports the planner in making decisions. All the mechanisms are integrated and synchronized in the robot using a general decision-making framework.
    Preprint
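
    A minimal sketch of the competition between cause-effect explanations, under our own assumptions: the representation and the smoothed score below are illustrative stand-ins for the papers' probabilistic estimate.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class CauseEffect:
        precondition: frozenset   # facts hypothesized to trigger the effect
        effect: frozenset         # state change the explanation predicts
        successes: int = 0        # times the prediction came true
        trials: int = 0           # times the precondition held and was tested

        def score(self) -> float:
            # Smoothed success rate (cf. the estimate sketched further below),
            # so that barely tested alternatives are not trusted too much.
            return (self.successes + 1) / (self.trials + 2)

    def promote_to_operators(alternatives, top_k=1):
        # Competition: all alternatives are scored in parallel and only
        # the most successful ones become planning operators.
        return sorted(alternatives, key=lambda h: h.score(), reverse=True)[:top_k]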

    Integrating task planning and interactive learning for robots to work in human environments

    Human environments are challenging for robots, which need to be trainable by lay people and to learn new behaviours rapidly without greatly disrupting the ongoing activity. A system that integrates AI techniques for planning and learning is proposed here to satisfy these strong demands. The approach rapidly learns planning operators from few action experiences using a competitive strategy in which many alternative cause-effect explanations are evaluated in parallel, and the most successful ones are used to generate the operators. The success of a cause-effect explanation is evaluated by a probabilistic estimate that compensates for the lack of experience, producing more confident estimations and speeding up learning relative to other known estimates. The system operates without task interruption by integrating into the planning-learning loop a human teacher who supports the planner in making decisions. All the mechanisms are integrated and synchronized in the robot using a general decision-making framework. The feasibility and scalability of the architecture are evaluated on two different robot platforms: a Stäubli arm and the humanoid ARMAR III.
    Peer Reviewed. Postprint (author's final draft)
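
    The abstracts do not spell out the estimate, so as a classical stand-in with the described behaviour, consider the rule-of-succession (Laplace-smoothed) estimate of the success probability of an explanation with s successes in n trials:

    \[ \hat{p} = \frac{s + 1}{n + 2} \]

    For small n this shrinks toward 1/2 instead of committing to the raw frequency s/n: one success in one trial gives 2/3 rather than 1, which is the kind of compensation for scarce experience the text refers to. The papers' own estimate may differ.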

    Quick learning of cause-effects relevant for robot action

    In this work we propose a new paradigm for the rapid learning of cause-effect relations relevant for task execution. Learning occurs automatically from action experiences by means of a novel constructive learning approach designed for applications where there is no previous knowledge of the task or world model, examples are provided online at run time, and the number of examples is small compared to the number of incoming experiences. These limitations pose obstacles for existing constructive learning methods, in which online learning is either not considered, a significant amount of prior knowledge has to be provided, or a large number of experiences or training streams is required. The system is implemented and evaluated on a humanoid robot platform using a decision-making framework that integrates a planner, the proposed learning mechanism, and a human teacher who supports the planner in the action selection. Results demonstrate the feasibility of the system for decision making in robotic applications.
    Preprint

    On-line learning of macro planning operators using probabilistic estimations of cause-effects

    In this work we propose an online method for learning action rules for planning. The system uses a probabilistic approach to constructive induction that combines a beam search with an example-based search over candidate rules to find those that most concisely describe the world dynamics. The approach permits a rapid integration of the knowledge acquired from experience. Exploration of the world dynamics is guided by the planner and, if the planner fails because of incomplete knowledge, by a teacher through action instructions.
    Preprint
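
    A generic beam-search skeleton consistent with this description; the rule representation, the refinement step, and the example-based scoring are left as parameters, and all names are illustrative.

    def beam_search(initial_rules, refine, score, beam_width=5, depth=3):
        # score(rule): example-based quality, e.g. how concisely the rule
        # describes the observed world dynamics (higher is better).
        # refine(rule): candidate specializations/generalizations of a rule.
        # Rules must be hashable so duplicates can be discarded.
        beam = sorted(initial_rules, key=score, reverse=True)[:beam_width]
        for _ in range(depth):
            candidates = list(beam)
            for rule in beam:
                candidates.extend(refine(rule))
            beam = sorted(set(candidates), key=score, reverse=True)[:beam_width]
        return beam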

    La renovación de la palabra en el bicentenario de la Argentina: los colores de la mirada lingüística

    The book brings together papers presenting the results of research by investigators from Argentina, Chile, Brazil, Spain, Italy, and Germany at the XII Congreso de la Sociedad Argentina de Lingüística (SAL), Bicentenario: la renovación de la palabra, held in Mendoza, Argentina, from 6 to 9 April 2010. The topics addressed in the 167 chapters show the main lines of research pursued chiefly in our country, but also in the other countries mentioned above, and they also point to areas that are just getting started, with little tradition in our country, which ought to be fostered. The papers published here fall within the following disciplines and/or fields of research: Phonology, Syntax, Semantics and Pragmatics, Cognitive Linguistics, Discourse Analysis, Psycholinguistics, Language Acquisition, Sociolinguistics and Dialectology, Language Teaching, Applied Linguistics, Computational Linguistics, History of Language and Linguistics, Indigenous Languages, Philosophy of Language, Lexicology and Terminology.
